Background: Many attempts have been made to resolve in time the folding of model proteins in
computer simulations. Different computational approaches have emerged. Some of these approaches
are insensitive to the geometrical properties of the proteins (lattice models), while others are
computationally heavy (traditional MD).

Results: We use a recently proposed approach of Zhou and Karplus, based on the discrete time
molecular dynamics algorithm, to study the folding of a protein model. We show that this algorithm
resolves the folding ⇌ unfolding transition in time. In addition, we demonstrate the ability to
study the core of the model protein.

Conclusion: The algorithm along with the model of inter-residue interactions can serve as a tool 
to study the thermodynamics and kinetics of protein models. 



I. INTRODUCTION

The vast dimensionality of the protein conformational space [1] makes the folding time too long to be reachable
by direct computational approaches. Simplified models [2-14] became popular due to their ability to reach
reasonable time scales and to reproduce the basic thermodynamic and kinetic properties of real proteins [?]: (i) a
unique native state, i.e. there should exist a single conformation with the lowest potential energy; (ii) a cooperative
folding transition (resembling a first order transition); (iii) thermodynamic stability of the native state; (iv) kinetic
accessibility, i.e. the native state should be reachable in a biologically reasonable time [?,17].
Monte Carlo (MC) simulations on lattices (see, e.g., [?] and references therein) appear to be useful for studying
theoretical aspects of protein folding. The Monte Carlo algorithm is based on a set of rules for the transition from one
conformation to another. These transitions are weighted by a transition matrix, which reflects the phenomena
under study. The simplicity of the algorithm and the significantly smaller conformational space of the protein models (due
to the lattice constraints) make on-lattice MC simulations a powerful tool for studying the equilibrium dynamics of
protein models. However, lattice models impose strong constraints on the angles between the covalent bonds, thereby
greatly restricting the conformational space of the protein-like model. An additional drawback of this restriction
is the poor capability of these models to discern the geometrical properties of proteins. The time in MC
algorithms is estimated as the average number of moves (over an ensemble of folding ⇌ unfolding transitions)
made by a model protein. It was pointed out that MC simulations are equivalent to the solution of the master
equation for the dynamics, so there is a relation between physical time and computer time, which is counted as the
number of MC steps. However, a number of delicate issues, such as the dependence of the dynamics on the set of
allowed MC moves, remain outstanding, so an independent test of the dynamics using the MD approach is needed.

To address questions that are sensitive to geometrical details, it is useful to study off-lattice models of protein folding.
Thus far, several off-lattice simulations have been performed [19-21], which demonstrate the ability of simplified
models to describe protein folding.

Here, we study the 3-dimensional molecular dynamics of a simplified model of proteins [?]. The potential of
interaction between pairs of residues is modeled by a "square-well", which allows us to increase the speed of the
simulations [22,23]. We estimate the folding time based on the collision event list, which, besides increasing the speed of
the simulation, allows for the tracking of "realistic" (not discretized) time. We show that such an algorithm can be
a useful compromise between computationally heavy traditional MD and fast but restrictive MC. We demonstrate
that the model protein reproduces the principal features of folding phenomena (i)-(iv) described above.

We also address the question of whether we can study the equilibrium properties of the core. The core is a small
subset of the residues that maintains the backbone of the structure at temperatures close to the folding transition
temperature (here the θ-temperature $T_\theta$). We emphasize the difference between the core and the nucleus of a protein:
while the core is a persistent part of the structure at equilibrium, the nucleus is a fragment of this structure that
is assembled in the transition state (TS), i.e. at the folding ⇌ unfolding barrier (see Fig. 1 in Ref. [?]). Based on simple
arguments, we estimate $T_\theta$ [24] for our model and compare it with the value found in the simulations.



II. THE MODEL 



We study a "beads on a string" model of a protein. We model the residues as hard spheres of unit mass. The
potential of interaction between residues is a "square-well". We follow the Go model, where an attractive potential
is assigned to the pairs of residues that are in contact ($\Delta_{ij}$, defined below) in the native state and a repulsive
potential is assigned to the pairs that are not in contact in the native state. Thus, the potential energy is

$$\mathcal{E} = \frac{1}{2} \sum_{i \ne j} U_{ij} , \qquad (1)$$

where $i$ and $j$ denote residues $i$ and $j$. $U_{ij}$ is the matrix of pair interactions

$$U_{ij} = \begin{cases} +\infty, & |r_i - r_j| \le a_0 \\ -\mathrm{sign}(\Delta_{ij})\,\epsilon, & a_0 < |r_i - r_j| \le a_1 \\ 0, & |r_i - r_j| > a_1 . \end{cases} \qquad (2)$$

Here $a_0/2$ is the radius of the hard sphere, $a_1/2$ is the radius of the attractive sphere (Fig. 1a), and $\epsilon$ sets the energy
scale. $\|\Delta\|$ is the matrix of contacts with elements

$$\Delta_{ij} = \begin{cases} 1, & |r_i^{NS} - r_j^{NS}| \le a_1 \\ -1, & |r_i^{NS} - r_j^{NS}| > a_1 , \end{cases} \qquad (3)$$

where $r_i^{NS}$ is the position of the $i$th residue when the protein is in the native conformation. Note that we penalize
the non-native contacts by imposing $\Delta_{ij} < 0$. The parameters are chosen as follows: $\epsilon = 1$, $a_0 = 9.8$ and $a_1 = 19.5$.
The covalent bonds are also modeled by a square-well potential (Bellemans' bonds):

$$V_{i,i+1} = \begin{cases} 0, & b_0 < |r_i - r_{i+1}| < b_1 \\ +\infty, & |r_i - r_{i+1}| \le b_0 \ \text{or} \ |r_i - r_{i+1}| \ge b_1 . \end{cases} \qquad (4)$$

The values $b_0 = 9.9$ and $b_1 = 10.1$ are chosen so that the average covalent bond length is equal to 10 (see Fig. 1b). The
original configuration of the protein ($N = 65$ residues) was designed by the collapse of a homopolymer at low temperature
[20,25,26]. It contains $n^* = 328$ native contacts, so $\mathcal{E}_{NS} = -328$. The 65 × 65 matrix of contacts of the globule in
the native state is shown in Fig. 2a. Note that the large number of native contacts ($328/65 \approx 5$ contacts per residue)
is due to the choice of the parameter $a_1 \approx 2a_0$, so that residues are able to establish contacts with the residues in
the second neighboring shell. The radius of gyration of the globule in the native state is $R_G \approx 22.7$. A snapshot of
the globule in the native state is shown in Fig. 2b.
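
As an illustration of the pair potential of Eq. (2) and the contact matrix of Eq. (3), the following Python sketch builds the contact matrix from a native conformation and sums the square-well pair energies of an arbitrary conformation. It is a minimal sketch and not the program used in this work: the function names, the coordinate format (a list of (x, y, z) tuples) and the `min_separation` argument, which decides whether chain neighbors are counted as pairs, are assumptions made for the example.

```python
import math

EPS = 1.0   # energy scale epsilon (value from the text)
A0 = 9.8    # hard-sphere diameter a_0 (value from the text)
A1 = 19.5   # attractive-well diameter a_1 (value from the text)


def contact_matrix(native):
    """Delta[i][j] = +1 if residues i and j are within A1 in the native state, else -1."""
    n = len(native)
    return [[1 if math.dist(native[i], native[j]) <= A1 else -1
             for j in range(n)] for i in range(n)]


def pair_energy(r, delta_ij):
    """Square-well pair energy for separation r and contact-matrix element delta_ij."""
    if r <= A0:
        return math.inf          # hard-core overlap
    if r <= A1:
        return -delta_ij * EPS   # attractive for native pairs, repulsive otherwise
    return 0.0


def total_energy(coords, delta, min_separation=1):
    """Sum pair energies over pairs with j - i >= min_separation.

    min_separation is a hypothetical knob for this sketch: 1 counts every
    pair once, 2 would skip covalently bonded neighbors.
    """
    n = len(coords)
    return sum(pair_energy(math.dist(coords[i], coords[j]), delta[i][j])
               for i in range(n) for j in range(i + min_separation, n))


# Toy 3-residue example (not the 65-residue globule of the text).
native = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (30.0, 0.0, 0.0)]
delta = contact_matrix(native)
print(total_energy(native, delta))
```

In the toy chain only residues 0 and 1 lie within the well in the native state, so every other pair that enters the well in a non-native conformation raises the energy by $\epsilon$.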

The program employs the discrete MD algorithm, which is based on the collision list and is similar to the one recently
used by Zhou et al. [22] to study the equilibrium thermodynamics of homopolymers and by Zhou and Karplus [23] to
study the equilibrium thermodynamics of folding of a model of Staphylococcus aureus protein A. A detailed description
of the algorithm can be found in [27-30]. To control the temperature of the protein we introduce 935 particles, which
interact with the protein and with each other only via regular collisions, serving as a heat bath. Thus,
by changing the kinetic energy of those "ghost" particles we are able to control the temperature of the environment.
The "ghost" particles are hard spheres of the same radii as the chain residues and have unit mass. Temperature is
measured in units of $\epsilon/k_B$. The time unit (tu) is estimated from the shortest time between two consecutive collisions
in the system between any two particles.
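
The event-driven character of such an algorithm can be illustrated for a single pair of residues. Between events the particles move ballistically, so the time at which their separation reaches a given diameter follows from a quadratic equation. The Python sketch below is an illustration under stated assumptions, not the collision-list code of the cited references: it only finds the next event time for one non-overlapping pair, and it omits the velocity update, the energy check that decides whether a pair can actually leave the well, and the event queue. The names `time_to_distance` and `next_pair_event` are hypothetical.

```python
import math

A0 = 9.8    # hard-core diameter a_0 (value from the text)
A1 = 19.5   # square-well diameter a_1 (value from the text)


def time_to_distance(r, v, d):
    """Time until |r + v*t| = d for relative position r and relative velocity v.

    From outside the sphere of diameter d this is the entry time (None if the
    pair never reaches it); from inside it is the exit time.
    """
    rv = sum(a * b for a, b in zip(r, v))
    vv = sum(a * a for a in v)
    rr = sum(a * a for a in r)
    if vv == 0.0:
        return None                       # no relative motion, no event
    disc = rv * rv - vv * (rr - d * d)
    if rr > d * d:                        # outside: must approach and hit
        if rv >= 0.0 or disc < 0.0:
            return None
        return (-rv - math.sqrt(disc)) / vv
    return (-rv + math.sqrt(disc)) / vv   # inside: leaves at the larger root


def next_pair_event(r, v):
    """Return (time, kind) of the next square-well event for one pair, or None."""
    events = []
    if math.sqrt(sum(a * a for a in r)) > A1:
        t_in = time_to_distance(r, v, A1)
        if t_in is not None:
            events.append((t_in, "well_entry"))
    else:
        t_core = time_to_distance(r, v, A0)
        if t_core is not None:
            events.append((t_core, "core_collision"))
        t_out = time_to_distance(r, v, A1)
        if t_out is not None:
            events.append((t_out, "well_exit"))
    return min(events) if events else None
```

A full simulation would compute such times for all relevant pairs, keep them in a sorted list, and advance the system directly to the earliest event, which is why the elapsed time is continuous rather than discretized.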






III. RESULTS 



In order to study the thermodynamics, we perform MD simulations of the chain at various temperatures. We start
with the globule in the native state at temperature $T = 0.1$ and then raise the temperature of the heat bath to the
desired one. Then we allow the system to equilibrate. At the final temperature, we let the protein relax for $10^6$ time
units. The typical behavior of the energy $\mathcal{E}$ and the radius of gyration $R_G$ as functions of time is shown in Fig. 3 for
three different temperatures.
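
For reference, the radius of gyration monitored in these runs can be computed from a single snapshot as sketched below. The sketch assumes the standard definition for equal (unit) masses, which the text does not spell out; the function name and the coordinate format are illustrative only.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of equal-mass residues from their center of mass."""
    n = len(coords)
    com = [sum(c[k] for c in coords) / n for k in range(3)]
    return math.sqrt(sum(math.dist(c, com) ** 2 for c in coords) / n)


# Two residues placed symmetrically about the origin.
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```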

In the present model the non-native contacts (NNC) are penalized (i.e., the pairwise interaction between NNC is
repulsive¹), so their number increases as the temperature increases. At high temperatures (above $T_\theta$), however, the
number of NNC varies only due to the random motion of the ideal chain and, thus, on average their number should
be constant at different temperatures. The maximal number of NNC occurs at $T_\theta$ and does not exceed 35, which is
roughly 10% of the total number of native contacts (NC).

The simulations reveal that the protein undergoes a folding ⇌ unfolding transition as we increase the temperature
to the proximity of the θ-temperature $T_\theta$, which in this model is $T_\theta = T_f \approx 1.46$. At $T_\theta$ the distribution of energy has
three peaks (Fig. 4a). The left peak corresponds to the folded state, the right peak corresponds to the unfolded state,
and the middle one corresponds to the partially folded state (PFS), with a 19-residue unfolded tail. This trimodality of
the energy distribution is also seen in Fig. 4b. The energy profile at temperature $T = 1.42$ (close to $T_\theta$) also reflects
these three states. Since $T < T_\theta$, mostly two states are present in Fig. 3b. Thus, the energy distribution has only
two peaks (Fig. 5), corresponding to the folded state and the PFS. Above $T_\theta$, the globule starts to explore energetic
wells other than the native well (see Fig. 13 in Ref. [31]).

To show that the PFS is the cause of the middle peak in the energy distribution (Fig. 4a), we eliminate the 19-residue tail
and plot the energy distribution of the resulting 46-mer at its θ-temperature $T_\theta = 1.44$ (Fig. 6). We expect to see only two
states, folded and unfolded, since the 19-residue tail, which is the cause of the PFS, is eliminated. Fig. 6 confirms our
expectations.

The folding ⇌ unfolding transition is further quantified in Fig. 7. The energy and the radius of gyration increase
most rapidly near $T_f = T_\theta$, resembling the order parameter jump in a phase transition (see discussion below). This
rapid increase of $\mathcal{E}$ and $R_G$ reaches its maximum at the θ-point, where the potential of interaction is compensated by the
thermal motion of the particles. Above $T_\theta$, interactions between residues do not hold them together any more and
the chain becomes unfolded (see Fig. 8a). Note that since all the attractive interactions are specific, the transition is
described by one temperature $T_f$.

The presence of the PFS is observed in the temperature range between 1.40 and 1.48, in which the collapse transition
occurs. Thus, in this particular region the folding temperature and the θ-temperature are indistinguishable within
the accuracy of their definitions.

Remarkably, a simple Flory-type model of an excluded volume chain predicts $T_\theta$ within 20%. To demonstrate this,
let us write the probability that the end-to-end distance of the chain is $R$ [24]:

$$P(R) \propto p(R) \exp\left( -\frac{N^2 v}{2R^3} - \frac{\mathcal{E}(R)}{T} \right) , \qquad (5)$$

where $v = (4\pi/3)(a_0/2)^3$ is the volume of the monomer and $p(R) \propto R^2 \exp(-3R^2/(2N(a_0/2)^2))$ is the probability
that the end-to-end distance of the chain is $R$ for the random walk model. For $T = T_f$, the repulsive excluded volume
term $-(N^2 v)/(2R^3)$ balances the attractive term $-\mathcal{E}(R)/T > 0$. Thus,

$$T_f \approx -\frac{2R^3 \mathcal{E}}{N^2 v} \approx 1.7 , \qquad (6)$$

where $\mathcal{E} \approx -130$ and $R \approx 24$ are taken for a certain configuration at the θ-point.
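
The numerical value of this estimate can be checked directly. The short Python sketch below evaluates the balance condition $T_f \approx -2R^3\mathcal{E}/(N^2 v)$ with the parameters quoted in the text ($N = 65$, $a_0 = 9.8$, $\mathcal{E} \approx -130$, $R \approx 24$); the function name is hypothetical and the expression is taken as stated above rather than derived anew.

```python
import math


def flory_folding_temperature(n_residues=65, a0=9.8, energy=-130.0, radius=24.0):
    """Temperature at which excluded-volume repulsion balances the attraction."""
    v = (4.0 * math.pi / 3.0) * (a0 / 2.0) ** 3   # monomer volume
    return -2.0 * radius ** 3 * energy / (n_residues ** 2 * v)


t_est = flory_folding_temperature()
print(round(t_est, 2), round(abs(t_est - 1.46) / 1.46, 2))
```

The estimate comes out slightly above 1.7, which is within 20% of the simulated value $T_f \approx 1.46$.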
We also compute the heat capacity $C_V$ from the relation [22]:

$$C_V = \frac{\langle (\delta\mathcal{E})^2 \rangle}{T^2} , \qquad (7)$$

where $\langle (\delta\mathcal{E})^2 \rangle = \langle \mathcal{E}^2 \rangle - \langle \mathcal{E} \rangle^2$ and $\langle \ldots \rangle$ denotes a time average. The time average is computed over $10^6$ tu of equilibration
at a fixed temperature. The dependence of the heat capacity on temperature is shown in Fig. 7b. There is a pronounced
peak of $C_V(T)$ at $T = T_f$.
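
A fluctuation estimate of this kind can be sketched in a few lines of Python. The sketch assumes that the heat capacity is the energy variance divided by $T^2$ (with $k_B = 1$, consistent with temperature measured in units of $\epsilon/k_B$) and that the energies are sampled at equal time intervals, so that a plain average stands in for the time average; the function name is hypothetical.

```python
def heat_capacity(energies, temperature):
    """Energy-fluctuation estimate C_V = (<E^2> - <E>^2) / T^2 with k_B = 1."""
    n = len(energies)
    mean = sum(energies) / n
    mean_sq = sum(e * e for e in energies) / n
    return (mean_sq - mean * mean) / temperature ** 2


# A toy trace hopping between two energy levels fluctuates strongly,
# whereas a trace that stays in one level gives zero heat capacity.
print(heat_capacity([-2.0, 2.0, -2.0, 2.0], 2.0))
print(heat_capacity([-2.0, -2.0, -2.0, -2.0], 2.0))
```

This is why the coexistence of the folded state, the PFS and the unfolded state near $T_f$ shows up as a peak: hopping between states with different energies maximizes the variance.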



¹ This corresponds to $g = 2$ of Ref. [23].

We note that below the folding temperature $T_f$ the globule (Fig. 8b) spends time in a state structurally similar
to the native state (Fig. 2b). However, even though the globule maintains approximately the same
structure, i.e. the same set of NC, the distances between residues are much larger than in the native state. Because
the potential of interaction between like residues is a square well, there is no penalty for these residues to
be maximally separated as long as they remain within the range of the attractive interaction. This allows the globule to have
more NNC and, thus, to maintain a native-like structure while having an energy larger than that of the
native state. This structure can be identified as the highest-energy structure that still maintains its core. As the temperature
increases, the ratio $|R_G - R_G^{NS}|/R_G^{NS}$ increases until the temperature reaches $T = T_f$, where the ratio becomes roughly
0.87.

To confirm the presence of the core, we calculate $f \equiv N_{NC}/N_C$ at temperatures below $T = T_f$. The attractive
inter-residue interaction term $-\mathcal{E}/T$ dominates the excluded volume repulsion term $N^2 v/(2R^3)$ (see Eq. (5)), so

$$-\frac{\mathcal{E}}{T} > \frac{N^2 v}{2R^3} . \qquad (8)$$

The total energy $\mathcal{E}$ has contributions from both NC and NNC, so

$$\mathcal{E} = -\epsilon (N_{NC} - N_{NNC}) = -[2f - 1]\,\epsilon N_C , \qquad (9)$$

where $N_C = N_{NC} + N_{NNC}$ is the total number of contacts. At a temperature slightly below $T_f$, $|T - T_f|/T_f \approx 0.03$,
the residues are maximally separated within their potential wells, yet they still maintain contacts. Therefore, the
volume $\tilde{v}$ spanned by one residue is roughly $\tilde{v} \approx (4\pi/3)(a_1/2)^3 \approx 8v$. $N_C$ is the product of the probability $\tilde{v}/R^3$
of having a bond (NC or NNC) and the total number of possible arrangements of the pair contacts between $N$ residues,
$N(N-1)/2 \approx N^2/2$. Thus,

$$N_C \approx \frac{\tilde{v}}{R^3} \frac{N^2}{2} . \qquad (10)$$

From Eqs. (8)-(10) we can estimate $f$, the fraction of NC at the temperature $T \approx 1.42 < T_f$:

$$f > \frac{1}{2} + \frac{v}{\tilde{v}} \frac{T}{\epsilon} \approx 0.68 . \qquad (11)$$

Because the globule maintains roughly the same volume at temperatures slightly below the θ-point,
Eq. (11) implies that approximately 70% of all native contacts stay intact in the folded phase (see Fig. ??). This
result is supported by the simulations: at $T \approx 1.42$ the number of NNC is roughly $N_{NNC} \approx 28$, and the energy
is $\mathcal{E} = -206$. Therefore, the number of NC is $N_{NC} \approx 234$, and the fraction of NC is $f \approx 0.89$, which is even higher
than the lower limit set by Eq. (11). Note that at temperatures higher than $T_f$, the fraction of native contacts
becomes small because in this regime the interactions are dominated by the excluded volume repulsion.
This change in the number of NC from 70% to close to zero indicates the presence of the core structure maintained by
these 70% of NC (see Fig. 8b and discussion below). Above the θ-point the globule is completely unfolded (Fig. 8a).
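
The comparison between the simulated fraction of native contacts and the analytical lower limit can be reproduced with the numbers quoted above. The Python sketch below inverts $\mathcal{E} = -\epsilon(N_{NC} - N_{NNC})$ to obtain $N_{NC}$ and $f$, and evaluates the bound in the form $f > 1/2 + (v/\tilde{v})(T/\epsilon)$ with $v/\tilde{v} = 1/8$. Reading the bound in this form, as well as the function names, are assumptions of the sketch.

```python
def contact_counts(energy, n_nnc, eps=1.0):
    """Return (N_NC, f) from the total energy and the number of non-native contacts."""
    n_nc = -energy / eps + n_nnc          # invert E = -eps * (N_NC - N_NNC)
    return n_nc, n_nc / (n_nc + n_nnc)    # f = N_NC / N_C


def lower_bound_f(temperature, eps=1.0, volume_ratio=1.0 / 8.0):
    """Lower limit on f, taking the bound as 1/2 + (v / v~) * (T / eps)."""
    return 0.5 + volume_ratio * temperature / eps


n_nc, f_sim = contact_counts(energy=-206.0, n_nnc=28)
print(n_nc, round(f_sim, 2), round(lower_bound_f(1.42), 2))
```

With $\mathcal{E} = -206$ and $N_{NNC} \approx 28$ this gives $N_{NC} = 234$ and $f \approx 0.89$, above the limit of about 0.68.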

The formation of a specific nucleus during the folding transition was suggested by many theoretical [?]
and experimental works [?]. The presence of the core at $T_f$ may arise from a nucleation process driving the
system from the unfolded state to the native state. We find indications of a first order transition. We also offer
theoretical reasoning for the presence of a core (Eq. (11)), which might indicate the presence of a nucleus. Next, we
identify the core.

We calculate the root-mean-square (rms) displacement $\sigma(T)$ of the globule at a certain temperature from the globule in the
native state, i.e.

$$\sigma(T) \equiv \left\langle \left[ \frac{1}{N} \sum_{i=1}^{N} \left( r_i^{NS} - \mathcal{R}\mathcal{T}\,\tilde{r}_i \right)^2 \right]^{1/2} \right\rangle = \left\langle \left[ \frac{1}{N} \sum_{i=1}^{N} \sigma_i^2(T) \right]^{1/2} \right\rangle , \qquad (12)$$

where $\tilde{r}_i$ and $r_i^{NS}$ are the coordinates of the residues of the globule in two conformations: a conformation at
the temperature $T$ and the native conformation, respectively. $\mathcal{T}$ is a translation matrix, which sets the centers of mass of
these configurations at the same point in space. $\mathcal{R}$ is a rotation matrix, which minimizes the relative distance between
the residues of the two configurations (for details see Ref. [?]). The $\sigma_i(T)$ in Eq. (12) are the rms displacements of each
individual residue.
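
For a single snapshot, the per-residue displacements $\sigma_i$ and the bracketed quantity of Eq. (12) can be sketched as follows. This is a deliberately reduced illustration: it performs only the translation step (both conformations are shifted to a common center of mass) and assumes that the optimal rotation has already been applied to the input, so the rotation-fitting step is not implemented here. The function names and the coordinate format are hypothetical, and the time average is left to the caller.

```python
import math


def centered(coords):
    """Shift a conformation so that its center of mass is at the origin."""
    n = len(coords)
    com = [sum(c[k] for c in coords) / n for k in range(3)]
    return [tuple(c[k] - com[k] for k in range(3)) for c in coords]


def rms_displacements(conf, native):
    """Return (sigma_i list, sigma) for one snapshot; rotation assumed already done."""
    a, b = centered(conf), centered(native)
    sigma_i = [math.dist(p, q) for p, q in zip(a, b)]
    sigma = math.sqrt(sum(s * s for s in sigma_i) / len(sigma_i))
    return sigma_i, sigma


# A rigidly translated copy of a toy conformation has zero displacement.
native = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 10.0, 0.0)]
shifted = [(x + 5.0, y - 3.0, z + 2.0) for x, y, z in native]
print(rms_displacements(shifted, native)[1])
```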

The plot of $\langle \sigma_i(T) \rangle$ is presented in Fig. 9a. From the roughness of the "landscape" in Fig. 9a, we can select a
group of residues whose rms displacements are significantly smaller than the rms displacements of the other
residues. We call the former group "cold" residues and the latter group "hot" residues. The rms displacement
depends strongly on the temperature near the folding transition and grows slowly below $T_f$. Note that the average
numbers of NC of the residues are correlated with the average rms displacement of these residues, i.e. the peaks on
the $N_{NC,i}$ isothermal lines of Fig. 9b correspond to the "cold" residues.

Next, we calculate the rms displacement $\sigma_C(T)$ for the selected 25% coldest residues (the core) and $\sigma_O(T)$ for the
rest of the residues. Fig. 10 shows their dependence on temperature, as well as the dependence of the rms displacement
for all residues, $\sigma(T)$. There is a pronounced difference in the behavior of the rms displacement of the core residues and
the rest of the residues below $T_f$. At $T_f$ their behavior is the same, because all the attractive interactions
are balanced by the repulsion of the excluded volume. Above $T_f$ the difference between $\sigma_C(T)$ and $\sigma_O(T)$ is only due
to the fact that the core residues have most of the NC and, therefore, are more likely to spend time together even at
$T > T_f$.
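
The selection of the coldest residues can be sketched as a simple ranking of the per-residue rms displacements. The Python sketch below is one possible way to do it and is not taken from the original program: the rounding rule for the size of the core, the function names and the toy data are assumptions.

```python
import math


def split_core(sigma_i, fraction=0.25):
    """Return (core, rest) index lists, the core being the coldest residues."""
    n_core = max(1, round(fraction * len(sigma_i)))
    order = sorted(range(len(sigma_i)), key=lambda i: sigma_i[i])
    return sorted(order[:n_core]), sorted(order[n_core:])


def group_rms(sigma_i, indices):
    """Rms displacement of a group of residues."""
    return math.sqrt(sum(sigma_i[i] ** 2 for i in indices) / len(indices))


# Toy per-residue displacements for an 8-residue chain.
toy = [5.0, 1.0, 9.0, 2.0, 7.0, 3.0, 8.0, 4.0]
core, rest = split_core(toy)
print(core, group_rms(toy, core) < group_rms(toy, rest))
```

By construction the rms displacement of the core group is never larger than that of the remaining residues; the physically interesting question is how the two quantities change with temperature.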

To study the behavior of the globule at $T_f$, we subdivide the probability distribution of the energy states $\mathcal{E}$ of
the globule maintained at $T_f = 1.46$ during $10^6$ tu into five regions: A, B, C, D, and E (see Fig. 11a). Region A
corresponds to the folded state; region B corresponds to the transitional state between the folded state and the PFS; region C
corresponds to the PFS; region D corresponds to the transitional state between the PFS and the completely unfolded state;
region E corresponds to the completely unfolded state.
Next we plot the rms displacement of each residue for each of the above regions (see Fig. 11b). Note that in region
A all residues stay in contact; in region C both the N- and C-terminal tails break away, forming the PFS; in region D
only a few core residues still stay intact; and in region E none of the residues is in contact. In region B, we
observe that part of the C-terminal tail residues are not in contact, indicating the formation of the PFS. Next, we plot
the dependence of the rms displacement of the selected 11 core residues (see caption to Fig. 9) on the average energy of
the window of the corresponding region (see Fig. 11c). We observe that core residues remain close to one another even in
the second transitional state D between the PFS and the completely unfolded state.

We also study the system by cooling it from the high temperature state. This technique corresponds to simulated
annealing, since the temperature is controlled by the ghost particles that are present in the
system. We find that if the target temperature is above 1.1 the globule always reaches the state corresponding to the
native state. However, if the target temperature is 0.96, the globule reaches the state corresponding to the native state
in only ≈ 70% of the cases within a time interval of $10^5$ time units. As an example, we demonstrate in Fig. 12 the
cooling of the model protein from the high temperature state $T = 3.0$ to the low temperature state $T = 0.1$. The
model protein collapses after 1200 tu.

What is particularly remarkable about Fig. 12 is that we can follow the kinetics of the collapse. First, the globule
gets trapped in some misfolded conformation, where it stays for about 1000 tu (see Fig. 12a), and then it collapses to
the native state. The time behavior of the energy, however, can look a bit puzzling. After the rms displacement drops
to close to 0, indicating the native state, the energy is still higher than that of the native state for about $10^4$ tu (see
Fig. 12b). The key to resolving this puzzle is that after the collapse of the model protein its potential energy
is transformed into kinetic energy, which slowly decreases through thermal equilibration with the bath of ghost particles.



IV. DISCUSSION 



We find that the classical model of the self-avoiding chain with excluded volume shows good agreement with the
simulations. We show from simple arguments and simulations that the fraction of NC just below the folding temperature $T_f$
is larger than 70%, consistent with the presence of the core. The nucleus forms in the unstable transition state. From
the transition state the globule jumps either to the completely unfolded conformation or to the folded conformation.

Our simulations are in agreement with the recent work of Zhou and Karplus [23]. They performed discrete molecular
dynamics simulations of Staphylococcus aureus protein A, the inter-residue interactions of which were modeled based
on the Go model [?]. The residue pairs of the model protein that form native contacts had a "square-well" potential
of interaction with the depth of the well equal to $B_N\epsilon$, while all other residue pairs had a "square-well" potential of
interaction with the depth of the well equal to $B_O\epsilon$. They characterized the difference between NC and NNC by the
"bias gap" $g$: $g = 1 - B_O/B_N$. Zhou and Karplus found that when $g = 1.3$, i.e. when the interaction between NC
is of the opposite sign to the interaction between NNC, there is a strong first-order-like transition from the random
coil to the ordered globule. The case of our globule corresponds to $g = 2$, where, according to the work of Zhou
and Karplus, there should exist a strong first-order-like transition from the random coil to the ordered globule without
an intermediate.

We also select the core residues and show that their rms displacement behaves significantly differently from
that of the rest of the residues and exhibits step-function-like behavior upon a change of
temperature. Our findings are in agreement with the recent experimental study of the equilibrium hydrogen exchange
behavior of cytochrome c by Bai et al. [?], who investigated the exposure of the amide hydrogens (NH) in cytochrome
c to solvent (due to local and global unfolding fluctuations). The experiments were based on the properties of the
amide hydrogens that are involved in hydrogen-bonded structure and can exchange with solvent hydrogens. Bai et
al. demonstrated that proteins undergo the folding ⇌ unfolding transition "...through intermediate forms". They also
identified these intermediate forms (cooperative units), which are 15 to 25 residues in size. The presence of the PFS in our
simulations is thus in agreement with the finding by Bai et al. of intermediate forms in cytochrome c.

The relation between the core residues that we find and the nucleus is hard to establish because the TS is
very unstable. Recent amide hydrogen exchange experiments on the CheY protein from Escherichia coli by Lacroix et al.
provided evidence for the residues involved in the folding nucleus. Furthermore, the lattice MC simulations
of Abkevich et al. [?] also demonstrate that the presence of the nucleus is a necessary and sufficient condition for
subsequent rapid folding to the native state. The crucial difference between the nucleus and the rest of the structure
is in the dynamics, which is also manifest in equilibrium fluctuations. All local unfolding fluctuations (i.e. the ones after
which the chain returns rapidly to the native state) keep the nucleus intact, while fluctuations that disrupt the
nucleus lead to global unfolding: a "descent" to the "unfolded" free energy minimum [?]. This view is consistent
with the hydrogen exchange experiments [?]. Such behavior of the globule is consistent with a possible first order
phase transition in a system of finite size.
FIG. 1. The potential of interaction between (a) specific residues; (b) neighboring residues (covalent bond). $a_0$ is the diameter
of the hard sphere and $a_1$ is the diameter of the attractive sphere. $[b_0, b_1]$ is the interval in which residues that are neighbors on
the chain can move freely.




FIG. 2. (a) The 65 × 65 contact matrix of the model protein in the native state. Black boxes indicate the matrix elements of
those residue pairs which have a contact (their relative distance is between $a_0$ and $a_1$). (b) A snapshot of the 65-residue
protein in the native state obtained at temperature $T = 0.1$.
FIG. 3. The dependence on time of (a) the energy $\mathcal{E}$ and (b) the radius of gyration $R_G$. The globule is maintained at three different
temperatures, $T = 0.78 < T_f$, $T = 1.42$, and $T = 1.63 > T_f$, for $10^6$ tu. For $T = 0.78$, the fluctuations of both $\mathcal{E}$ and
$R_G$ are small, i.e. the globule is found in one folded configuration. At high temperature ($T = 1.63$) the fluctuations of $\mathcal{E}$ and
$R_G$ are large; the globule is mostly found in the unfolded state. At the temperature $T = 1.42$, which is close to $T_f$, the globule
is mostly present in two states. The lower energy configuration corresponds to the folded state: the globule is compact (see
(b)). The other configuration has large fluctuations: the globule is in the PFS. There is an additional state, the unfolded
state (see (b)), but at $T = 1.42$ the protein model is rarely present in it. Thus, the behavior of the globule at
temperatures close to $T_f$ indicates the presence of three distinct states: folded, unfolded and PFS.
FIG. 4. The probability distribution of (a) the energy states $\mathcal{E}$ and (b) the radius of gyration $R_G$ of the globule maintained at
$T_f = 1.46$ for $10^6$ tu. The trimodal distributions indicate the presence of three states: the folded state, the PFS, and the unfolded
state.
FIG. 5. The probability distribution of the energy states $\mathcal{E}$ of the globule maintained at three different temperatures:
$T = 1.25$, 1.42, and 1.73. Note that at $T = 1.42 \approx T_f$ the distribution has two pronounced peaks. The right peak of this
($T = 1.42$) distribution corresponds to the PFS, while the left peak corresponds to the energetic well of the native state.
FIG. 6. The probability distribution of the energy states $\mathcal{E}$ of the 46-residue globule maintained at $T_f = 1.44$ during $10^6$ tu.
The bimodal distribution of energy indicates that the 19-residue tail is responsible for the PFS of the 65-residue globule: after
eliminating the 19-residue tail, the trimodal energy distribution of the 65-residue globule becomes the bimodal energy distribution
of the 46-residue globule.
FIG. 7. The dependence on temperature of (a) the energy $\mathcal{E}$, (b) the heat capacity $C_V$, and (c) the radius of gyration $R_G$.
The error bars are the standard deviation of fluctuations. The rapid increase of energy as well as the sharp peak in heat
capacity at $T = T_f$ indicate a first order phase transition.
FIG. 8. A snapshot of the protein in (a) the unfolded state, obtained at high temperature $T = 1.8$; and (b) the transition
state, obtained at the folding transition temperature $T_f = 1.46$ (green), overlapped with the globule at low temperature $T = 0.4$
(red). Note that the TS globule has a close visual similarity to those maintained at low temperature and in the native state
(see also Fig. 2b). It is more dispersed, however, which makes all the NC easily breakable. To compare the globule at the TS
with the one maintained at temperature $T = 0.4$, we perform the transformation proposed by Kabsch to minimize the
relative distance between the residues in the TS and the state at $T = 0.4$. The "cold" residues (grey spheres) denote residues
whose rms displacements are smaller than $a_1$.
FIG. 9. (a) The contour plot of the rms displacement $\sigma_i(T)$ for each residue $i = 0, 1, \ldots, 64$ at temperatures $T = 0.3$, 0.97,
1.34, 1.46 (bold line) and 1.54, averaged over $10^6$ tu. Note that there is a distinct difference between the "cold" (small values
of $\sigma_i(T)$) and "hot" residues (large values of $\sigma_i(T)$). The dashed-dotted line indicates the breaking point of the NC, i.e. when
$\sigma_i(T)$ is of the size of the average relative position between pairs of residues, i.e. $\sigma_i(T) = (a_0 + a_1)/2 \approx 15$. The bold lines (on
both (a) and (b)) indicate the folding transition temperature line $T_f$. It is worth noting that 11 residues are still in contact
(marked by red circles on (a)): 16, 23, 24, 25, 26, 27, 28, 29, 37, 38, 39. (b) The plot, analogous to (a), of the average number
of NC for each residue. Note that the number of NC is strongly correlated with the rms displacement: the local minima of the $\langle \sigma_i(T) \rangle$ plots
correspond to the local maxima of the number of NC.




FIG. 10. The dependence on temperature of the rms displacement of the core residues $\sigma_C(T)$ (solid line), of the rest of the residues $\sigma_O(T)$
(dashed line) and of all the residues $\sigma(T)$. The above quantities are averaged over $10^6$ tu. Note that for an ideal
first order phase transition one would expect $\sigma_C(T)$ to be a step function. However, since we consider a transition that would
be first order in the limit of infinite size, $\sigma_C(T)$ exhibits only step-function-like behavior. The difference between core
residues and other residues is that at $T_f$ the average rms displacement of the core residues is smaller than 15, which indicates
that they are in contact (see caption to Fig. 9). On the contrary, the average rms displacement of the non-core residues is
greater than 15, indicating that these residues are not in contact.
FIG. 11. (a) The probability distribution of the energy states $\mathcal{E}$ of the globule maintained at $T_f = 1.46$ during $10^6$ tu. The
probability distribution is divided into five regions: A, B, C, D, and E. Region A corresponds to the folded state; region B
corresponds to the transitional state between the folded state and the PFS; region C corresponds to the PFS; region D corresponds to
the transitional state between the PFS and the completely unfolded state. (b) The plot of the rms displacement $\sigma_i(T)$ for each residue
$i = 0, 1, \ldots, 64$ for the regions A, B, C, D, and E of plot (a), averaged over $10^6$ tu. Note that in region A all residues
stay in contact; in region C both the N- and C-terminal tails break away, forming the PFS; in region D only a few core
residues still stay intact; and in region E none of the residues is in contact. (c) The dependence of the rms displacement
of the core residues (circles) and of the rest of the residues (squares) on the average energy $\mathcal{E}$ of the window of the corresponding
region. Note that core residues stay intact even in the second transitional state D between the PFS and the completely unfolded
state. The dashed-dotted line in (b) and (c) indicates the breaking point of the NC (see Fig. 9).
FIG. 12. The time evolution of (a) the rms displacement per residue of the globule from its native state and (b) the energy
when we cool the system from the high temperature ($T = 3.0$), unfolded state down to the low temperature ($T = 0.1$) state.
The model protein gets trapped in the misfolded conformation after 200 tu and then proceeds to its native state after about another 1000 tu.
Although the rms displacement of the globule from its native state is close to 0 after 1200 tu, the energy of the globule is
higher than the energy of the native state for about $10^4$ tu. This effect is due to the thermal bath of ghost particles, which thermally
equilibrates the system during a relaxation time $t_{relax} \approx 10^4$ tu.